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ABSTRACT 


Maritime Interdiction Missions (MIM) are of great interest and high operational 
importance to the U.S. Navy, the U.S. Coast Guard, and allied forces. The MIM scenario 
discussed in this thesis includes an area of interest with multiple neutral and hostile 
vessels moving through this area, and an interdiction force consisting of an unmanned 
aerial vehicle (UAV) and an intercepting vessel, whose objectives are to search, identify, 
and intercept hostile vessels within a given time frame. In this thesis, we develop 
Stochastic Dynamic Programming models, which represent the MIM scenario. While a 
theoretical method of producing an optimal decision policy for the interdiction force is 
presented in this thesis, it is shown that such computation is intractable. The models 
developed in this study are used to analyze and evaluate the performance of a heuristic 
decision policy that we recommend to be applied by the interdiction force. Based on a 
numerical case study, which includes several representative MIM scenarios, we show that 
the number of intercepted hostile vessels following the heuristic decision policy is at least 
60% of the number of hostile vessels intercepted following the optimal decision policy. 
Based on the results of the heuristic performance in the numerical case studies, we 
recommend the implementation of our suggested heuristic in an operational decision aid 


for Maritime Interdiction Missions. 
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EXECUTIVE SUMMARY 


Maritime Interdiction Missions (MIM) are of great interest and high operational 
importance to the U.S. Navy, the U.S. Coast Guard, and allied forces. The MIM scenario 
discussed in this thesis includes an area of interest (AOJ) with multiple neutral and hostile 
vessels moving through this area, and an interdiction force consisting of an unmanned 
aerial vehicle (UAV) and an intercepting vessel, whose objectives are to search, identify, 
and intercept hostile vessels within a given time frame. The goal of this thesis is to 
optimize the operational policy used in employment of such interdiction force. We 
discuss why finding the optimal decision policy for the interdiction force is intractable, 
while suggesting a sub-optimal yet effective and practical heuristic decision policy, 


which performance does not fall by much behind the optimal policy. 


Scenario and Concept of Operations 


The MIM scenario discussed in this thesis includes multiple moving neutral and 
hostile vessels and an interdiction force comprising two assets: a UAV that detects 
identifies, and tracks suspected vessels and a navy vessel that intercepts suspected vessels 
indicated by the UAV. The goal of the interdiction force is to intercept as many hostile 
vessels as possible in a given time frame. We assume the existence of some prior 
intelligence regarding the expected movement patterns of both neutral and hostile vessels. 
The UAV moves around the AOI while searching for vessels. Once a vessel is detected, 
the UAV tracks that vessel for a given time period, while attempting to identify that 
vessel as either a neutral or a hostile one. The UAV identification is accomplished by 
utilizing a sensor (e.g., electro-optical or radar sensors) to recognize the physical 
signature of that vessel and its movement pattern. Once the UAV has flagged a vessel as 
a suspicious one, the interception vessel is called in to physically intercept that suspicious 


vessel. 


XIX 


Methodology and Models used 


We developed a Stochastic Dynamic Programming model, which represents a 
system of neutral and hostile vessels together with an interdiction force operating inside 
the AOI. This model is used to optimize the operational policy of the interdiction force in 
this MIM scenario. The model includes the states of the system, the decisions the 
interdiction force is required to make, the information obtained by the interdiction force 
following such a decision, the manner by which the system evolves over time according 


to the decision policy being used and the overall objective function. 


While this model can theoretically be used to find the optimal decision policy for 
the interdiction force, we show why this problem is intractable and cannot be 
implemented in a real-world operational scenario. Instead, we suggest a heuristic 
approach to constructing an effective sub-optimal decision policy and present its 
performance in achieving the interdiction force’s goal. The analysis of this heuristic 
decision policy is accomplished by means of bounding its expected performance based on 


a simplified dynamic programming model of the MIM scenario at hand. 


Both of the above models—the original dynamic programming model and the 
simplified one—have been implemented in MATLAB with the aim of analyzing a 
numerical case study of several representative MIM scenarios. This numerical case study 


is used to evaluate the performance of the heuristic decision policy. 


Results 


The analysis of several MIM scenarios as part of the numerical case study, along 
with a sensitivity analysis and parametric study, has shown that the number of intercepted 
hostile vessels following the heuristic decision policy is at least 60% of the number of 


hostile vessels intercepted following the optimal decision policy (which is intractable to 





produce). Furthermore, we present an approximate optimization for one of the key 
operational parameters in the employment of the interdiction force: How often should one 
call the interception vessel to intercept suspicious vessels being tracked by the UAV. It 
has been found that when the boarding time of an intercepted vessel is negligible, it is 


XX 


best to call the Interceptor even when there is only a small probability that the tracked 
vessel is an hostile one, while when the boarding time is in the order of an hour, it is best 
to call the Interceptor only when there is at least 40% chance that the tracked vessel is a 


hostile one. 


Conclusions 


Based on the results of the heuristic performance in the numerical case studies, we 
recommend the use of this heuristic in any MIM scenario which closely resembles the 
MIM scenario discussed in this thesis. This heuristic can be effectively implemented in 


an operational decision aid for Maritime Interdiction Missions. 
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I. INTRODUCTION 


A. OVERVIEW 


Maritime Interdiction Missions (MIM) are of great interest and high operational 
importance to the U.S. Navy, the U.S. Coast Guard, and allied forces. MIM typically 
involve search, identification, and interception of suspected vessels and are the 
operational background to this thesis. We develop a stochastic dynamic-programming 
model representing a maritime area of interest with multiple moving neutral and hostile 
vessels and an interdiction force comprising two assets: an unmanned aerial vehicle 
(UAV) that detects identifies, and tracks suspected vessels and a navy vessel that 
intercepts suspected vessels. The model developed in this thesis generates optimal and 
near-optimal policies for MIM that may lead to better utilization of the interdiction force. 
The model can be implemented in a tactical decision aid and produce real-time 
recommendations for courses of actions to an interdiction force commander in MIM 


scenarios. 


B. BENEFITS OF STUDY 


The near-optimal policy for employing an interdiction force in MIM developed in 
this thesis will have operational value when implemented in a real-time decision aid. 
Moreover, the thesis is a first attempt to model combined search-interception operations 


in a stochastic and dynamic setting. 


Cc. RELATED WORK 


The field of classical search theory has been extensively studied, as discussed by 
Washburn [1] and Stone [2]. Additional work on route optimization for multiple sensors 
and resource-constrained searchers has been done by Royset and Sato [3], [4]; see also 


references therein. 


Dynamic task allocation and vehicle routing has been studied by Smith [5], who 


discusses multiple autonomous vehicles employment in dynamic environments. 
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Optimization of employment of non-reactive sensors has been addressed by Kress, 
Szechtman and Jones [6]. Probabilistic search optimization and mission assignment for 
heterogeneous agents is discusses by Chung, Kress and Royset [7]. This last study deals 
with a similar situation as in this thesis, but adopts a heuristic approach without solution 
quality estimates in terms of upper and lower bounds. This thesis develops such bounds 


for the specific situation with one UAV and one navy vessel. 
D. THESIS ORGANIZATION 


This thesis is organized in the following way: Chapter I presents an overview of 
the operational scenario and the problem at hand, while listing the main assumption used 
in this research. Chapter II presents the basic development of the dynamic programming 
models used in this thesis. In Chapter III we continue to discuss the models presented in 
Chapter II while completing the detailed development of all necessary models. Chapter 
IV discusses an analysis of a numerical case study, and presents the results and insights 
of this analysis. Chapter V summarizes this research and presents the conclusions. 
Appendix A offers a list of acronyms and symbols used throughout this thesis, while 
Appendix B presents a brief overview of the MATLAB code implemented in this 


research. 


E. SCENARIO 


We consider a maritime area of interest (AO/) that contains multiple objects of 
interest (objects) some of which are hostile objects (targets) and the remaining are neutral 
objects (neutrals). The number of targets and the number of neutrals are unknown. The 
AOI is subdivided into a number of area cells (ACs). The time is divided into discrete 
time periods. All targets are dynamic and move independently according to a known 
Markov chain defined on the set of ACs. The neutrals move likewise, but according to a 
different Markov chain. Motivated by our discretization of space and time, with 
resolution that can be arbitrarily high, and assuming that the AOI is relatively large 
compared to the (unknown) number of objects, we neglect the possibility of more than 


one object in any specific AC at any given time period. This is an approximation to the 
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real situation, which is quite reasonable in open-sea scenarios, and which makes the 
model tractable. The interdiction force’s UAV is hereafter referred to as a Recognizer and 
the navy vessel as an /nterceptor. Figure 1 presents an example of such an AOI, with 
multiple targets and neutrals arriving, leaving and moving about the AOI as well as a 


Recognizer and an Interceptor. 


The Recognizer patrols the AOI in a fashion to be determined, with the goal to 
correctly identify as many targets as possible. After arriving to a specific AC, the 
Recognizer searches for the presence of an object in that AC. We assume that the 
Recognizer has perfect detection capabilities, which means that the Recognizer can 
determine with certainty whether the AC is empty or contains an object. If there is no 
object in that specific AC, the Recognizer decides on its next AC to visit. In case there is 
an object in that AC, the Recognizer follows it for a single time period while trying to 
identify it as either a target or a neutral. After this single time period of tracking the 
object, the Recognizer has to decide whether that object is a likely target or a likely 
neutral. The Recognizer flags a tracked object as a likely target if the probability that the 
tracked object is a target is greater or equal to a predetermined threshold. If a tracked 
object is flagged as a likely target, the Recognizer then calls in the Interceptor to intercept 
the object. While waiting for the Interceptor to arrive and intercept the object, the 
Recognizer stays with the object. In this thesis we make the simplifying assumption that 
once flagged as a likely target, the object is forced to remain stationary at its location at 
the time when the Recognizer identified it as a likely target. This assumption avoids the 
need of a complicated interception model, which would have made our scenario more 
realistic, yet would needlessly complicate the model if one is not concerned about the 


interception “end-game.” 





Figure 1. An example of an AOI with multiple neutrals (4) and targets (7), a 
Recognizer (R) and an Interceptor (/). 


The Interceptor’s only task is to intercept objects identified to be likely targets. 
The interceptor intercepts the suspicious target (still being tracked by the Recognizer) and 
identifies it as a (real) target or a neutral. The Interceptor has perfect identification 
capability; it can distinguish with certainty between a target and a neutral. While waiting 
for such tasks, the Interceptor is stationary at a certain location inside the AOI (the 
Interceptor only moves when called in to intercept a likely target). The goal of the 
interdiction force is to intercept as many targets as possible within a given time horizon, 
while possibly taking into account discounting of intercepted targets (the discounted 
value of an intercepted target is called a reward). If the Recognizer identifies an object as 
a likely neutral or if the Interceptor intercepts an object that turns out to be a neutral, then 
that object is “marked” as such and is of no future interest (e.g., some form of a tag or a 
beacon is placed on that object to avoid future “redetection” and tracking by the 
Recognizer). Figure 2 summarizes the concept of operations (CONOPS) used by the 


interdiction force. 










Decide where to go 





Object detected 
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Figure 2. Summary of CONOPS of the interdiction force 


F. ASSUMPTIONS 


The scenario analyzed in this thesis and the corresponding model includes several 


key assumptions, which are discussed next. 
1. Arrival and Movement of Targets and Neutrals 


We assume that all objects, targets and neutrals, independently enter the AOI, 
move inside the AOI, and then leave the AOI. The objects do not behave strategically 
and, therefore, their movement is not affected by the actions of the interdiction force, that 
is, neither the presence of the Recognizer nor the actions of the Interceptor affects the 
behavior of the objects. We also assume that the AOI is sparse enough, or the spatial and 
time resolutions are high enough, such that the probability of more than one object in an 


AC at any given time period is negligible. 


At any given time period, new objects may enter the AOI. The set of ACs to 
which objects may arrive is called the Boundary of the AOI. At any given time period, at 


most one object may enter each of the ACs on the boundary of the AOI with given 
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probabilities. The entry probability may depend on the type of object—target or neutral— 
as well as the location of the AC. ACs not on the boundary have probability 0 of an 
atrival from outside the AOI. Once inside the AOI, each object moves according to a 
known Markov process with different transition probabilities for targets and neutrals. A 
designated AC represents the area outside the AOI to which objects eventually transit 
according to the Markov process transition matrix. The arrival process into each of the 
ACs on the boundary of the AOI is Bernoulli. Note that a key assumption is that the 


likelihood of more than two simultaneous arrivals to the same AC is negligible. 


Absent interruptions by the interdiction force, the overall stochastic process 
describing the arrivals, transits and exits of objects is a large-scale Markov chain, with 


states defined by a vector (YY aot) where Y, is 1 if a neutral is present in the 
corresponding AC, Y, is 2 if a target is present in the corresponding AC, Y, is 0 if the AC 


is empty, and |AOl | is the number of ACs in the AOI. The state space of this Markov 


chain is very large (3h states), and so it is not used directly in any calculations in this 
thesis. Instead of looking at the probability of a state, we consider the marginal 


probabilities for each ACP[Y,=y,]. Our assumptions imply that Y,.¥, are 


AOl| 
independent discrete random variables. We assume that the MIM starts when the system 


represented by Y,...Y¥ is at steady-state. In subsequent chapters in this thesis, 


|AOI| 
whenever we refer to a Markov process or a Markov transition matrix, we refer to the 
movement process of some individual object and not the overall large-scale Markov 
process discussed above. We assume stationarity in the sense that neither the entry 
probabilities nor the in-AOI transition or exit probabilities depend on the time period f, 


and they do not depend on the interdiction force actions. 
Zi Sensors Capabilities 


The Recognizer has perfect detection capability, which means that, once in an 
AC, the Recognizer detects the presence of an object in that AC with certainty and there 
are no false positive detections. Following a detection, the Recognizer tracks the detected 


object for a single time period in an attempt to recognize it and determine whether it is a 
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target or a neutral. The tracking process is fault-free in the sense that the Recognizer 
never loses contact with the tracked object, however, the recognition capability is 
imperfect and the identity of the object may be subject to false negative (identifying a 
target as a neutral) and false positive (identifying a neutral as a target) errors. The 
Recognizer utilizes two recognition techniques: movement recognition and signature 
recognition. Movement recognition is based on the movement pattern of the object, 
attempting to detect anomalies. Signature recognition is based on the physical 


characteristics of the object. 


The Interceptor is assumed to have perfect recognition capability. Once the 
Interceptor has intercepted a suspicious object (after being dispatched by the Recognizer, 
which had flagged that object as a likely target) it can perfectly determine whether this 


object is indeed a target or a neutral. 
3. Separation of Detected Objects 


Once an object is flagged as a likely neutral (at the end of the Recognizer’s 
tracking phase) or once an intercepted object turns out to be a neutral, it is taken away 
from the AOI for the remainder of the time horizon. We assume that in both cases some 
form of a beacon or other tagging technique is used to prevent this specific object from 


ever being of interest for the interdiction force. 
4. Finite Time Horizon 


The time horizon is finite, which means that the interdiction force does not care 
about events that will take place past that time horizon. In particular, a successful 
interception of a target past the time horizon is considered to be worthless for the 
interdiction force. This assumption may be appropriate in an operational scenario in 
which there exists a strict deadline. In scenarios in which a strict deadline does not exist, 
this artificial time horizon is known to have some distorting effect on the results of the 


model. One can try to reduce this end-effect by using a long time horizon. 
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I. MODELS DEVELOPMENT: MAIN PARTS 


A. DYNAMIC PROGRAMMING FORMULATION 


Let A denote the set of ACs in the AOL, A,denote the area outside the AOI to 


which objects eventually transit when they leave A , and E < A denote the subset of ACs 
at the boundary of the AOI. Let ¢ denote the index of time periods. Let T denote the 
finite time horizon of the scenario. As mentioned in the Introduction, we assume that the 
grid of ACs and time resolutions are fine enough such that at any given time period there 
may be at most one arrival of a new object into each one of the ACs in EF and at most one 
object present in a certain AC in A. The objects enter the AOI and move about the AOI 


independently. 


The dynamic programming formulation used in this thesis is based on the 


conventions found in Powell [8], pp. 129-178. 


1. State 


' . te tig 
We define the state as the vector s = (t,r,i, 7,0) where ¢ is the time period, 


r € Ais the Recognizer’s location, i ¢ Ais the Interceptor’s location, z is a vector of 


probabilities with components z,,a¢A, where z, is the probability that a neutral is 
present in ACa , and@ is a vector of probabilities with components 0,,a¢A, where 


8 is the probability that a target is present in AC a. Let 


a 


\A| 


Sc {l,2,...7}x Ax Ax[0,1]" x[0,1] be the space of all possible state vectors. 


Although the two probability vectors z and @ are continuous, they only can take finite 
number of values because the number of detection and interception opportunities in a 
given time horizon is finite. Thus, our defined state-spaceS is of finite (though high) 


cardinality. 


2. Decision 


A decision x in the scenario determines the next AC to be visited by the 
Recognizer, thusx¢.A. This decision is made either at t=0 or when the existing 
decision is fathomed. A decision x € A is said to be fathomed in one of the following 
three situations: (i) no object is found by the Recognizer in AC x, (ii) an object is found 
by the Recognizer but it is recognized as a neutral, or (iii) an object is intercepted. Note 
that as a decision can only be made in one of the cases presented above (start of the 
search or when a decision is fathomed), this dynamic programming model slightly differs 
from standard models where decisions are made at a constant rate at every time step (see 
Powell [8], pp. 132—135). In fact, the time interval between two consecutive decisions is 


a random variable, which is defined in the next section. Since a decision is always made 


based on the current state, we often write x(s). In some cases, we drop the explicit 


dependency of the decision x on the state s to simplify our expressions. 


3. Information 


Let w= (ARP ae, ) denote the random vector representing the information 
obtained by the interdiction force once a decision is fathomed and the consequences of 
that decision are realized. The time-interval Ar,, denotes the time duration of the current 
state-transition, that is, the duration between the time period in which decision x is taken 
and the time period when x is fathomed. The variables 7, and 7,,denote the Recognizer’s 
and Interceptor’s locations, respectively, at the end of the current state-transition. The 


Bernoulli random variable z,, is equal 1 if a target has been intercepted and 0 otherwise. 


Note that these four random variables are not independent. Let W denote the space of 


possible realizations of w. The explicit probability distribution of w, which is a function 


of both the current state s and the decision x(s), is presented later on. To represent this 
dependency, we write w(s,x(s)) to denote the random vector w with probability 


distribution depending on the state s and the decision x(s). Our notation does not 
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distinguish between the random vector w or its components (Ar,,7,,i,,z,,) and their 


ww? 


realizations. The meaning should be clear from the context. 
4. State Transition 


The next state is determined using the _ state-transition function 


ia (s,x(s),w(s,x(s))), which is a deterministic function of the current state, of the 


decision made and of the information obtained following the realization of a decision: 


s” Sx AxW>S (2.1) 


The explicit function will be presented later. 
3. Reward 


The reward c is a function of the information w and the state s and is defined in the 


following way: 


c:WxS>R (2.2) 
-(t+Ar,,) 
Fer ie ah ; t+At,<T 
c(w(s,x(s)),s) = ; ( ) pate (2.3) 


Uy Ud 
where w=(At, Fgh 2% ) 8 =(t:751,7,0) and T is the time horizon of the scenario. 


The reward is 0 if no target is intercepted or if the time of interception is beyond the time 
horizon, and the reward is 1 if a target is intercepted earlier than the end of the time 
horizon, while introducing a discounting factor y for interception at later times. Choosing 
a discount factor y = 0 means that the same reward is collected for intercepting a target 
now or later. A discount factor vy > 0 means that the reward for intercepting a target in the 
near future is higher than intercepting it at a later point in time. Note that the reward is 
not a function of the state alone, but of the information obtained (after making a decision) 


as well. 
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6. Timeline 


We assume the following timeline (Figure 3): Once in a state s and a decision has 
just been fathomed, a new decision x(s) € A is made, after which information w(s, x) is 
obtained, and the new state s’ is derived by the deterministic function s” (3 w) _Asa 


decision is not made at every time step, and information is not obtained at every time 


step, we do not use the common notation of subscript f for s,,x, or w,. In our notation ris 


treated as any other component of the state vectors . 





Figure 3. Timeline of the dynamic programming formulation 


7. Bellman Equation 


We define the value of being in state s as the maximum expected reward from 
the time of being in state s to the end of the time horizon T . This value V (s) is given 


by a Bellman equation: 


V(s)= max £[ e(w(s.-(s)).s)+V (5! (s.2(s).0(s.2(s)))) t<T (2.4) 


0, t>T 


where f is the time of state s =(t,r,i,, 0) . This equation can be written in a simplified 


form: 


_ max {E[ e(w,s)+V(s™ (s,x,w))]}, t<T 


0, f2T 


(2.5) 


The optimal policy in a state sis the optimal solution of the above Bellman equation. 
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B. METHODOLOGY OUTLINE 


Let B denote the problem of finding the optimal policy, which maximizes the 
reward obtained in a finite time horizonT according to the states, decisions, information, 
state transition and rewards presented above. This problem can theoretically be solved by 
backward calculation of its states’ values using the Bellman equation (Equation 2.5). A 
Backward Dynamic Programming algorithm (see Powell [8], p. 50) starts from the end of 
the time horizon and backtrack to the present while calculating the value of each possible 
state at each time period. We recall (2.5) that the value of each state is the expected 
reward obtained in the next state transition plus the expected value of the next resulted 
state, which already has been calculated since we are moving backwards through time. 
This sequential process requires the calculation of each state value for each time period in 
the finite time horizon, and so it is crucial to have a small enough state space to be able to 
practically implement this algorithm and solve a real-life problem. We encounter a 
difficulty when we realize that the state spaceS has a huge number of possible states, 
mainly due to the probabilities vectors z and @, which can take numerous values. This 
calculation difficulty renders our problem intractable, and so we turn to a different 


method of producing an operational policy to be applied in the scenario. 


In this thesis, we suggest to use an easy-to-calculate heuristic instead of solving 
for the exact optimal policy, while analyzing the performance of this heuristic by 
bounding the error between this heuristic’s value and the actual optimal policy’s value. 
We construct a new problem B , named the Upper Bound Problem, characterized by the 
fact that the reward collected in this new problem is an upper bound to the reward 
collected in our original problem B. Any feasible policy in problem B is a lower bound on 
the optimal reward collected in Busing the optimal policy, and so by calculating the 
reward collected using our heuristic’s policy in problem B and solving B for its optimal 
reward we are able to bound the optimal reward of our original problem B . Furthermore, 
the difference between the optimal reward collected in B and the reward collected using 
our heuristic’s policy is an upper bound for the difference between the optimal reward 
collected in the original problem B and the reward collected using our heuristic’s policy. 
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This thesis includes the complete developing of B and B, a method of effectively 
solving B, a method of effectively calculating the reward collected using the heuristic’s 


policy, and a numerical analysis of several case-study examples. 
C. SUGGESTED HEURISTIC 


We construct a simple and greedy heuristic for the Original Problem B, defined 


as follows: 


6 
x" (s) e arg max « ——_4“#—_ 2.6 
(s) a jer chart 20) 


where s = (t, r,i,2,0) is the current state, r is the Recognizer’s current location, i is the 
Interceptor’s current location, Ts is the time it takes the Recognizer to reach AC a from 


its current locationr, 7 


i,a 


is the time it takes the Interceptor to reach AC a from its 


current location 7, and 6 is the estimated probability of a target in AC a at the time the 


Recognizer reaches AC a if the Recognizer decides to visit that AC. We add one time 
period for the Recognizer’s identification process (the time it takes to identify a detected 
object as either a likely target or a likely neutral while tracking it). This is an attempt to 
estimate a normalized value of each potential AC to visit next, by dividing the probability 
of a target being in that AC by a rough estimation of the time it will take to detect and 
intercept it. There is a chance for a tie between several ACs that maximize the above 


expression, in which case we can arbitrarily choose among them. 
D. REWARD CALCULATION USING HEURISTICS POLICY 


This reward is calculated by running forward in time in the original 
problem B while following our heuristic decision policy. The Bellman equation used in 


this calculation is: 
El e(w.s)+V" Ge (s.x",))]. t<T 


0, t>T 


(2.7) 
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which is the Original Problem’s Bellman equation without the maximization over the 


possible decisions x, but instead just choosing x with our heuristic. We use V" (s) to 


denote the value of each state following our heuristic decision policy, which is different 


than the optimal value of each state using the original Bellman equation (2.5). 
E. UPPER BOUND PROBLEM 


We define an Upper Bound Problem B that is a simplified version of our Original 
Problem B . The main difference between the two problems is that in B the interdiction 
force experiences a “situational awareness memory loss” after each decision is fathomed. 
Specifically, each time a decision has been fathomed, we “reset” the two probability 
vectors z and @ to their initial values at time t=0. Because the probability vectors z 
and @ both remain the same each time a decision is made, we can exclude them from the 
definition of the state in the Upper Bound Problem B . This simplified state space makes 
the Upper Bound Problem tractable and allows us to solve it in reasonable time. The 
above “memory loss” property also means that we allow the interdiction force to 
potentially re-collect the same rewards over and over again. In the Original Problem, 
each time the Recognizer visits an AC, we set the probability that the AC contains a 
target and the probability that AC contains a neutral to 0 (a comprehensive discussion of 
these probability updates appears in Chapter III-A.2). Our definition of the state in the 
Upper Bound Problem does not include any history of all previously visited ACs. 
Specifically, the Recognizer do not “remember” that the probabilities of a target and a 
neutral being in each visited AC should be set to 0 at the time of that visit. Explicitly, by 
not setting to 0 the probability that a visited AC contains another target once it has been 
visited, all ACs in the AOI have greater or equal probability of containing a target than 
they should truly have (as in the Original Problem). Instead of keeping track of the most 
updated z and @ vectors as being done in the Original Problem B , each time we make a 
decision we assume we are back to the initial steady-state of the probability vectors z 
and @. These steady-state probability vectors are discussed later, with explicit formulas 
appearing in (3.1) and (3.2). Having this “memory loss” property and assuming we 


always make a decision in steady-state, we are in risk of getting “trapped” in the same 
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AC (which probably have relatively high probability of containing a target) just because 
we do not take into account the fact that we have just visited this AC and as such the 
probability that another target has just entered into this AC is likely to be less than the 
high steady-state probability of a target being in this AC. In an attempt to avoid these 
unwanted cases we temporarily update the probability of another object in the AC the 
Recognizer is currently in at the time of making each decision (i.e., drop both the 
probability of a target in this AC and the probability of a neutral in this AC down to 0). 
This temporary update exists only until the current decision is fathomed. Once we 
complete the current state transition and end up in the next state we “forget” this 


temporary update and assume we are back to a stead-y-state. 


As discussed above, the Upper-Bound Problem’s state is slightly different from 

the Original Problem’s state and is defined as follow: 
s=(6,7,7)' (2.8) 
where (7.7.7) are the time, Recognizer’s location and Interceptor’s location (the same 


as in the Original Problem B ), and the probability vectors z and @ have been omitted. 


The Upper-Bound Problem’s Bellman equation can now be written: 


ma {E[ E(w.) +V(5" (5,7) ]}, t<T 


V(s)=1 80 


0, t>T 





(2.9) 


which seems almost identical to the Original Problem Bellman equation (2.5), but of 
course differ in the actual definitions and meaning of the variables and functions it 
incorporates (e.g., the definitions of the state s, the probability distribution of the 
information w, etc). We use “bar” to denote these variables and functions in the Upper- 
Bound Problem B, which differ from their counterparts in the Original Problem B. 
Another subtle difference is that the transition function 5” is not directly related to x 
but only through w, as presented later in (3.54). A more comprehensive discussion of the 


Upper Bound Problem B is presented in Chapter III.B. 
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Hl. MODELS DEVELOPMENT: DETAILS 


A. ORIGINAL PROBLEM 


if Probabilities and Object Recognition Techniques 


In this section, we present the notation and formulas describing the calculation 
and update of several variables and probability distributions needed for our model. The 
steady-state probabilities of a neutral and a target at each AC comprise the baseline at the 
beginning of the scenario (though we could start with non steady-state probability vectors 
just as well). As the scenario progresses, these steady-state probabilities will no longer be 
relevant and we will have to keep track of an updated probability map of neutrals and 
targets at each AC in the AOI. These probabilities will change as a result of the sensors’ 


observations and the Markov transition probabilities of the objects. 
a. Initial Pre-Search Probabilities: Uninterrupted Steady-State 


Let P =| P(a',a) |, a’,ae AUA, be the Markov transition matrix for 
any neutral in the AOI, with P(a',a) representing the probability of a single time-step 
transition from AC a’ to AC a. Let O= [Q(a’,a) |. a’,aeAUA, be the Markov 
transition matrix for any target in the AOI, with Q (a’, a) representing the probability of a 
single time-step transition from AC a’ to AC a. Let @, denote the single time step 


probability of a neutral entry to AC /<E, and let f denote the single time step 


probability of a target entry to AC EE. 


Absent any other information (such as sensor’s observations), and based 
on our assumption that the probability of more than one object in a cell is negligible, the 


steady-state probability of a neutral in AC a € Ais approximated by 
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1-(1-a@,) [](1-@,P*(i.a)), acE 


m = leE k=l (3.1) 


1- [](\-@,P*(.a)), age 


where P* (I a) is the (/,a) entry of P‘—the transition matrix P raised to the kn power, 


and the superscript 0 denotes steady-state probabilities. Similarly, the steady-state 


probability of a target in AC a € A is approximated by 


a ae 


0° = leE k=l (3.2) 


-TITT (1-40! (.2)), agE 


leE k=l 
Based on our assumptions, we identify three mutually exclusive and 
collectively exhaustive events associated with an AC a € A: (i) void of objects, denoted 


V_, (1) contains one neutral, denoted NV, and (iii) contains one target, denoted 7). The 
event O, denote a situation where there is an object, of any kind, in AC a, that is, 


O,=N, UT, =V<. Our spatial and temporal assumptions lead us to the following 


approximated results regarding the steady-state: 


Pr{N,\~ 2 G3) 
Pr{T,}=@) (3.4) 
Pr{V,} = (1-2? \(1-@? )~1-2) - 6! (3.5) 


b. Markov Updates 


When about to make a decision regarding the next AC to investigate, the 
Recognizer first needs to estimate the probability of a target and the probability of a 
neutral at each of the ACs in the AOI at the time it would arrive to the designated AC. 
Let #' denote the time a decision is made and z’ the probability vector of a neutral in each 
AC at time ¢’. For any time ft >t’ we can calculate the new probability vector due to the 
progress of time, absent any new information about the AOI (i.e., no sensor 
observations). Let z‘ denote this new probability vector: 
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n= #(t-1',2") WES?’ (3.6) 
where a(t ise) is the function which updates the probability vector of a neutral at 


each AC. Note that this function does not depend on both time arguments but only on 
their difference. This function appears in several other places in this thesis. The trivial 


case is when no time had passed (i.e., t=?’ ), in which case there is no update and so we 
get that z' = #(t-1',2" ) =z. Also note that initially, absent any sensor information, 
the system is in steady-state and therefore z' = a(t —t', x) =z" forall t>t'. We need 


to distinguish between four non-trivial cases: (i) single time-step anda eé E, (ii) two or 
more time-steps anda eé E, (ili) single time-step anda ¢ E, (iv) two or more time-steps 


anda¢E: 


1-(1-a,)[](I-2)P(a',a)), t=t'+laceE 





aceA 
1-T][(1-2)-P(a',a)), t=t'+la¢E 
aeA (3.7) 
t-t'-1 
= I-(-a,){ [I-21 (o')) | (l-a,P* (.4))| ’ t>t'+2,aceE 
aceA leE k=l 
1-( [ae co.) TTT 0-2 (.2))}, t>t'+2,a¢E 
acA leE k=l 
nr, t=t' 


a 


where P* is the neutrals’ Markov transition probabilities matrix raised to the k power. 


Similarly we define 6! = 0, (1 —t', 0") for the targets: 


G, = 6,(t-1',0") = 
1-(1-B,)[](1-@Q(a',a)), t=t'+LacE 
deA 
1-[](1-40(@.a)), t=t'+la¢E 
aeA we (3.8) 
= 1-0-4) [1 t-a0" (oa) (1- 6,0" (.4))} t>t'+2,acE 
aeA leE k=l 
1- TI (1-50"" (a) TTT ~ B,0% (.0))} t>1'+2,a¢E 
adeA leE k=l 
O., t=t' 


c. Recognizer Updates 


In this section, we discuss the probability updates generated by the 
Recognizer while detecting and tracking an object. These updates eventually determine 


whether or not the Recognizer calls in the Interceptor to intercept the suspicious object. 


Recall that the Recognizer has perfect detection capabilities (i.e., detecting 
whether an AC is empty or not). Let x €.A denote the AC the Recognizer has decided to 
investigate next. If the Recognizer arrives at time ¢ in AC xand finds no object there, 
then the current decision (search AC x at time t) is fathomed and a new decision needs to 


be made. If an object is detected in AC x, then the immediate probabilities update are: 


t 
1 
ee eee (3.9) 
+E. 


gre oe jag (3.10) 
a+. 


where °°“ and 6°" are the probabilities of the detected object being a neutral or a 


target, respectively, and zr‘ and @’ are the corresponding prior probabilities. 
g p y. x fe p gp Pp 


If an object is detected in AC xat time rf, the Recognizer continues 
tracking the object for one more time period (f+1) in which the tracked object either 
moves to AC j¢A, leaves the AOI to ACA, or stays at AC x. Note that the object 
might stay at AC xor leave the AOI to.A,, according to the specific Markov transition 
matrices P and Q. If the object has left the AOI, the Recognizer ends up back in AC x at 
the end of the tracking time period and the decision is fathomed. After tracking the 
detected object (assuming it did not leave the AOD), the Recognizer decides if it is likely 
to be a neutral or a target. While tracking the object, the Recognizer only focuses on the 


tracked object and is not searching in other ACs. 


During tracking, the Recognizer utilizes two modes for recognizing the 
tracked object: signature recognition (e.g., electro-optical sensor, radar, etc) and 
movement recognition, in which the Recognizer tries to identify the movement pattern of 


the tracked object (1.e., leaving known shipping lanes or any other suspicious movement). 
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Both recognition techniques take place within the tracking time period. Without loss of 


generality, we assume that signature recognition takes place first. 


During the signature recognition mode, the Recognizer takes g looks 
(glimpses) at the tracked object where the glimpses are conditionally independent given 
the presence of the object. Let 1—u denote the single glimpse false negative probability 
of the Recognizer identifying a target as a neutral, and let 1—v denote the single glimpse 
false positive probability of the Recognizer identifying a neutral as a target. Suppose that 


n glimpses report “neutral” and g—n glimpses report “target.” Recall that j <A is the 
AC to which the tracked object has moved while being tracked. Let z‘*'* denote the 
posterior probability of a neutral following the g glimpses of the signature recognition 


process. We have: 


x 


(«)" (1 _ ve gine 





ais “OV; aoe SEER ae Leia — a AN ARO oe Dor (3.11) 
[so (l-v)i "ah" + at —u) (uy "ae?" 
and similarly, the posterior probability of a target is: 
s\(1— uw)" (u)*" gre 
t+1,Sig __ (*}( % = t+1,Si. 
00S = =l-z°"" (3.12) 


n 


[F}»"(a-v)* om ( (1—u)' (wy or ' 


Note that the factors f ] cancel out. 


Finally, the movement recognition mode takes into account the fact that 
the object has moved from ACxto ACJ. Taking the posteriors of the signature 


recognition mode as priors for the movement recognition mode we obtain the probability 





of neutral: 
t+1,Rec Pog j) Be 
P(x, j)ay "+ Q(x, /)8; 
Similarly, the posterior probability of a target is: 
x, . git) Sig 
gre — Cre = : Ol “) u apo =. as (3.14) 
P(x, fj)" + O(x, j) 8," 


where we introduce 6*” as a short-hand notation for , which we use later. 
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t+1,Rec 
6; 


Once tracking is complete, the Recognizer compares the probability in 
(3.14) to a predetermined threshold M and decides whether to flag the tracked object as a 
target or to let it go and decide on the next AC to visit. If this threshold is met, the 
Interceptor is called for interception while the Recognizer keeps watching the object until 


the arrival of the Interceptor. 
2. The State-Transition Function 


We recall that the state-transition s” (s,x,w) is a deterministic function of the 


current states, of the decision to visit AC x, and of the information revealed once x is 


fathomed. We define the state-transition function in the following way: 


sY:SxAxW>S (3.15) 
t+At,, 


s“(s,x,w)=| 4, (3.16) 


’ We 
where w=(At,,,7, bee) is the information, s =(t,7,1,2,0) is the current state, and 


w? 


a” and @™” are components of s” representing the probability vectors 7 and @ of the 


next state. In computing 2” and@” we use (3.7) and (3.8). There are three periods of 
time we potentially need to account for (Figure 4): (i) the time between making the 
decision to go to AC xand the time of arrival tox, (ii) the tracking time of the detected 


object, and (iii) the waiting time for the Interceptor to arrive and intercept the object. 
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Making a decision Arrival to AC X , no object detected, 
decision is fathomed, new state 


R 
t+T,. =t+At, 
Case 1: No object detected in AC x 


Making a decision Arrivalto AC X , End of tracking, decision is 
object detected fathomed, new state 


R R 
t teh. tT SISter At, 
Case 2: Object detected in AC x , but it is not flagged as a likely target 


Making a decision Arrival to AC X , End of tracking Interception, decision is 
object detected fathomed, new state 


R R 
t a aa ee hae | t+ At, 
Case 3: Object detected in AC x , and it is flagged as a likely target 





Figure 4. Timeline of state transitions 


For any decision x € A, the time the Recognizer arrive at AC xis t+ Te . Let 7 


and @ denote the resulting probability vectors of a neutral and a target, respectively, in 


each AC in the AOI at time t+T... For each vector component corresponding to an 


ACae Awe get: 
re #(T..7), a#tx 
t, = (3.17) 
0, a=x 
~ 6(T;",0), atx 
d= (3.18) 
0, a= % 


where we set 7#,and 0, to 0 because of two possible reasons: either no object has been 
detected in AC x, or an object was detected but because it will be tracked and handled 
separately — we can now assume the probability of another object in AC x is 0. 

We now need to account for the time of tracking a detected object, if one had 
indeed been detected. Let 7/ denote an operator that operate on a probability vector (either 


z or 0). The operator 7{ updates the probability vector it operates on to the appropriate 


value one time step later, and then set to 0 the probability component of that vector which 
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corresponds to the current Recognizer location. Using the operator#/{on a probability 
vector produces another probability vector. Let(Hz)_ denote the component of the 


resulting probability vector #17 corresponding to AC ae .A. We can formally define the 


operation of 1 in the following way: 


2 #(1, i), if a is not the AC to which the tracked object moved 
(Hz), = ae . (3.19) 
% 0, if ais the AC to which the tracked object moved 
as 6 1, @ ; if a is not the AC to which the tracked object moved 
(Hd) = (19) ; (3.20) 
¢ 0, if ais the AC to which the tracked object moved 


Recall that the duration of a tracking phase is always a single time step. When 
arriving at ACr, we need to have probabilities update similarly to the one we had for the 
time it takes the Recognizer to reach AC x. If the tracked object is eventually flagged as a 
likely target, we need to have a similar probability update for each time step until the 
decision is fathomed. For both cases when the tracked object is flagged as a likely target 


and when it is not flagged as a likely target, the overall remaining time between the time 
the Recognizer reaches AC x and when the decision is ultimately fathomed, is Ar, -T* : 
For each time step until the decision is fathomed, we need to update the probabilities of a 
target and a neutral at each AC using H. We can now explicitly write 7" and@” : 


mm -H# (3.21) 
6" =H'O (3.22) 


where H‘z denotes operating H for k times on the probability vector 7, H‘@ denotes 


operating H for k times on the probability vector 6, and we use k = At, —T,".. In the 


case that k = Ar, -T.“. =0, the operator H' is the identity operator. 
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3; Information Probability Mass Function 


We need the probability mass function of the obtained information w for several 
reasons. Firstly, in order to calculate the Bellman equation for the Original 
Problem B (2.5), we need to compute the expected value (with respect to w ) of the sum of 
the reward and the value of the next state. As discussed earlier (Chapter II.B), such a 
direct calculation is intractable, and so the approach taken in this thesis is to approximate 
the expected value using lower and upper bounds. Still, we present the formulas for direct 
calculation of the Original Problem’s Bellman equation in Chapter II.A.4. Another 
reason we need the probability mass function of wis for the simulative calculation of 
Bellman equation using our suggested heuristic, as later discussed in Chapter III-C. In 
this simulative calculation we need to generate multiple realizations of the obtained 
information w according to its probability mass function. Furthermore, the following 
derivation of the probability mass function of w serves as the basis for the derivation of 


the probability mass function of the obtained information w in the Upper-Bound 
Problem B . 
Recall that the information vector w describes the tactical consequences of a 


decision to visit a certain AC x: the time until the decision is fathomed, the locations of 


the Recognizer and Interceptor when this happens and whether or not a target has been 


eventually intercepted. While w= (At ial, } is an end-result, it can be equivalently 
described in terms of the events that occur during Afr. Following a decision x there are 
five possible scenarios which may occur: (1) no object is detected in AC x, (ii) an object 
is detected in AC xand it leaves the AOI (moves to.A,) while being tracked, (iii) an 
object is detected at AC x, moves to AC j € A and is flagged as a neutral, (iv) an object 
is detected at AC x, moves to AC j <A, is flagged as a target, and when intercepted is 


identified as a neutral, and (v) an object is detected at ACx, moves to AC je A, is 


flagged as a target, and when intercepted, it is confirmed as a target. 
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Based on these five possible scenarios, we can explicitly write the possible 


values w = (Ar rl 


w?"w? 


Uy 
2%) can take: 


(2000) F if no object is detected in AC x 
At. (7 + 1,x,1,0) , if an object is detected in AC x, then moves to A, 
r, (7 Regia a} if an object is detected in AC x, moves to AC j € A and (3.23) 
= st o) ’ 1, ’ 
re is flagged as a likely neutral 


7 é ; ..,\ if an object is detected in AC x, moves to AC j € A, flagged 
(7 +1+7;,.,5,0) : ; 
as a likely target, intercepted but turns out to be a neutral 
; , ...\ ifan object is detected in AC x, moves to AC j € A, flagged 
(Ti +1+T;,./,5,1) » - 
as a likely target, intercepted and is indeed a target 


To derive the probability mass function of w we need to derive the probability of 
each of the above scenarios. To do that, we first define two new random variables: 


-1, if no object is detected in AC x 
d=, 0, if an object is detected in AC x and while being tracked it moves to A, (3.24) 
J,  if.an object is detected in AC x and while being tracked it moves to AC j ¢ A 
-l, if no object is detected in AC x (and so nothing is flagged) 
f=) 9, if an object is detected in AC x and after tracking it is not flagged as a likely target (3.25) 
1, if an object is detected in AC x and after tracking it is flagged as a likely target 


Note that f =0 can either imply that the tracked object is flagged as a likely 
neutral or that it has left the AOI and so it is not flagged at all. We also recall (see 
Chapter III.A.1.a) that T, denotes the event of AC xcontaining a target at the time the 
Recognizer had reached AC x, and that N, denotes the event of AC xcontaining a 


neutral at the time the Recognizer reaches AC x. Rewriting (3.23) we get: 


(780). FSGS (1) 
ar) |(TX+hxi0),  — f =0,d=0 (1) 
i =4(7%, +1 j,i,0), $4=5 Vie A (un) 29) 
. (72. 41+7),,340),f=bd=),N, eA) 

(TR +147), ji), f=bd=)T, Wed) 
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Next we compute the probability of each of the event on the right-hand side of 
(3.26). 
The first event (I ) is the simplest to compute (note that f =-1<d=-1): 
Pr{f =-1,d =-1}=1-#,(7,.,2)-6,(7%.,0) (3.27) 


where s =(t,r,i,,0)'is the current state. 


The second event (I ) is also relatively simple to compute (note 


thatd =0=> f =0): 





Pr{ f =0,d =0} =Pr{d =0} = P(x,A,)@(T..2)+O(x,A,)O(T4..8) (3.28) 


To compute the probabilities of the other three events in (3.26), we first recall that 
@** denotes the posterior probability, at the end of the tracking phase, that the tracked 
object is a target. Also recall that a tracked object is flagged as a target if 0°° >M, 


where M is the flagging threshold. 


We defer the computation of event (///) after we compute events (UV) and (V). 
The following derivation is true for all j¢A: 
Prif =ld=7,N,} 
=Pr{d=j,N,}Pr{f =1ld = j,N,} 


=Pr{d = j,N,}Pr{0** >M Id = j.N,} (3.29) 


=Pr{d= j,N,} Pr{or >M |d=j,N.,n=n'\Pr{n=n'ld=j,N,} 


n'=0 
where gis the total number of glimpses the Recognizer takes while tracking an object, 
and n< gis the number of glimpses which indicated the tracked object being a neutral. 
Note that for every j ¢.A, we can calculate the maximal value of n for which ee >M. 


Let n, denote this value ofn . This means that for every j € A: 


iB ifn'<n, 
Pr{O > M Id = j,n=n'\ = ‘ 
0, ifn’ >n, 


(3.30) 


Zi 


Note that the value of n, depends on the specific AC j to which the tracked object 
moves, because 0“ takes into account movement recognition (i.e., an object moving to 
AC j may result in flagging it as a neutral while the same object moving to a different 
AC j’ may result in flagging it as a target, even if in both cases the same number of 


glimpses report “neutral’’). 


Continuing our derivation from (3.29) we get: 





Pr{d = j.N,} Do Pr{ar >M d= j,N,,n=n'}Pr{n=n'ld = j,N,}= 
ae (3.31) 


=Pr{d=j,N,}> Prin=n'ld=j,N,} 


n'=0 
Using Bayes’ formula for the first multiplicative term on the right-hand-side of 
(3.31) we get: 
Pr{d = j,N,}oPr{n=n'ld = j,N,} =Pr{d = jIN,}Pr{N,} > Pr{n=n'ld = j,N,} (3.32) 
The right-hand-side of (3.32) can be explicitly expressed in the following way: 


nj 


Pr{d = ji NYPr{N,) SPrfn =n'ld = j,N,} = P(x, f)#, (7,7) > ((*}" (1 -v)*") (3.33) 
n'=0 n'=0 
Summarizing (3.29) - (3.33) we get that for every j <A: 


Pr{f =1,d = j.N,}= P(x /)4,(T2.2)> (sv G-v)"") (3.34) 


n'=0 


Following a very similar derivation we get the probability of event(V ) : 


Pr{f = 1d = j,7,}= (x, /)6, (7.0) >1((*]@-u)" ("") (3.35) 


n'=0 
The derivation of the probability of event (III ) is similar to(I V) and(V ) , with two 
key differences: We need to consider both7\ and NV, events, and we are interested in the 


cases where the number of glimpses reporting “neutral” is high enough to result in 


flagging the tracked object as a neutral. This means that in the summation over the 
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possible number of glimpses reporting “neutral” we sum from n; +1 to g. We get that 


for every j¢A: 
Bey =0nt= 7} = 


= P(x, /)#,(TA,7) » ((s]>" -»)""} +O, 1) 6, (75,9) > ((:]@-u)" 


41 n'=n +1 


(3.36) 


a 


Combining (3.26) with the probabilities calculated in (3.27), (3.28), (3.34), (3.35) 


and (3.36) we can completely present the probability mass function of w: 


Ap) (Pe 
 fal* |lat-a 6 3.37) 
Us i | #, (7.7) 0(75,9) ( 
Zy 0 
At,, Ti 
Pr{| "|= ‘ = P(x,A,)4(T4.,2)+O(xA) (7.0) (3.38) 
L, i 
Bi ND +P(x.x)4,(Th.2) > ((s}» (1-v)° 
+0(x,x)6, (7.0) > ((s(1-uy" we) 
At,) (Ti. +1 
: i 3.39 
nfe pe feemewenglisyer] 
; j aan Ae : 
Vie Alx +9(x j)8,(T.8) Dr ‘Je-m a | 
At) (tlt, 
Ba ele =P A Mor E(t") G40 
qy 0 
ViEeA 
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Pr : = = 2(x./)8, (75,0) (3-0) )") (3.41) 
ee 1 
VjeA 


while the probability that w will get any other value than those which appear above is 0. 


4. Bellman Equation Calculation 


Finding the optimal decision policy for the Original Problem B requires 


calculating the value V(s) of all states s = (t, r,i, 7,0)! using Bellman equation (2.5). 
The trivial case is whent => T, which results in V (s) = (0. Computing the value of a state 


when ¢<T requires calculating the maximal expected sum of the reward obtained 
between the time the current decision is made and the time that decision is fathomed, and 
the value of the next state. Calculating this maximal value is attained by total 
enumeration of all feasible decisions. In the following section, we discuss the calculation 
of this expected sum of reward and value of the next state. Recall that the expected value 


of a sum is always equal to the sum of the expected values: 


V(s) 7 mee {E[ e(w,s) +V(s" (s,x,w)) |} = max {E[e(w,s)] + E|V(s" (s,x,w)) ]} (3.42) 


where: 
E|c(w,s)] = laa = w't (3.43) 
E[V(s™( (s,x,w) )) |= DVv(s"( S,X,W ))Pr{w=w't (3.44) 


Recall that c(w,s) is actually only a function of the Af, and z,, components of 


the information random vector w (2.3), and so for the calculation of the first expected 
value we only need _ the joint probability distribution of these two random variables, and 
not the complete joint probability distribution of all four components of w. Similarly, 


r, and i, components of w (3.16), and so 


w? “w 


sg” (s,x, w) is actually a function of just Ar 


for the calculation of the second expected value we only need the joint probability 
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distribution of these three random variables and not the complete joint probability 
distribution of all four components of w. We define two new functions which represent 


these dependencies: 


é(Ar,,z,.5)=c(w,s) Vs eS,Vw=(At,,r,,i,,2,)'€W (3.45) 


o- (s:x,At rst) =<" Gra w) VseS,VxeA,VwH= (At,,.7, ins Zy)'€ W (3.46) 


w? 





Using this functions we can now rewrite the expected values from (3.43) and 
(3.44): 
1 ce 
E[e(w, s)] = E[é(Ar,, ae s)] =e ( At ze Prt At At saa (3.47) 
Zy=0 At! =0 
E|Vv(s" (s,x,w)) | = 


7 E|v(s" (s,x,At,.7,,i,)) | = y > dv (5" (ear )Pr{Ar, = At’ .r, = Pai. = i'} 


At, =0 rc A i,eA 


(3.48) 


The joint probabilities appearing in (3.47) and (3.48) are marginal probabilities of 


the full probability mass function of w = (At,,, Lee a ) : 


w? 





PriAre= Ar z= 2. )= >. Pri Arar rt =1,2,=2,) , <GA9) 


TEA i,eA 





Ww? w w?"w w? 


1 
Prat aw aria l=) Bra aAy har ac, =<} 90) 
zi,=0 


Using (3.50) and the probability mass function of win (3.37)-(3.41) we can 
explicitly present the expected value in (3.48): 
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E[V(s™ (s,x,w))]=(Z)() + > (IV) + > ((V) (vr) + (Viz) (VT) 


jeA jeA 


(3.51) 





Similarly, we can use (3.49) and the probability mass function of w to explicitly 


present the expect value in (3.47) (recall that the reward is non-zero only when z,, =1): 
E Le(w, s) | - 
n, (3552) 


= 6, (7,0) (ter) O(x, A) [8](1—w)" (wh 


JEA n'=0 








Calculating the value of the initial state according to the Bellman equation (2.5) 
using our results in (3.51) and (3.52), requires going over all combinations of states, 


decisions and possible information realizations. Recall that the state is defined as 
s= (t, r,i,7,0) . Though the two probability vectors @ and z can theoretically take any 
value in the continuous interval [0,1] , their values are determined by the history of the 
Recognizer location. The number of different paths the Recognizer can take in the time 


T+2 


horizon T is |Al’, and therefore the state space size is |S|=7-|A]-|A]-|Al =T.|Al. 





Examination of the information w Probability Mass Function in (3.37) - (3.41) shows 


that for every state s and decision x, the size of the information space is 
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|W] =3-|A]+ 2 (by counting the number of possible values of w in the explicit PMF 


formulas). Combining all together, calculating Bellman equation in the Original Problem 


requires going over all combinations of state, decision and information realization, which 


T+3 


results in a run time of o(T-|A -(3-|A|+2)} for the backward recursion dynamic 


programming algorithm (see Powell [8], p. 50). This shows why the Original Problem is 
intractable for realistic situations with more than a few ACs and time periods. As an 
example, even a small scenario with 10 ACs and time horizon T =5, results in 16 billion 


calculations. 
B. UPPER BOUND PROBLEM 


The Upper Bound Problem B is very similar to the Original ProblemB: A 
decision is defined in the same way (though the two problems have different optimal 
decision policies). The information in B is defined in the same way as in B, though its 
probability distribution is different. The state transition functions of the two problems are 
closely related. The rewards in the two problems are practically the same, except that we 
formally use different functional notation because they operate on different state spaces. 
Lastly, The Bellman equations of the two problems appear to be almost identical, but 
they use slightly different variables as presented earlier in (2.9). In the following sections, 
we formally define the Upper Bound Problem B and discuss the differences with respect 


to the Original Problem B . 
1 States 


Recall that we previously defined a state in B as c=(671)' where f¢ is the 


time, 7 is the Recognizer’s location and 7 is the Interceptor’s location (2.8). The space 


of all possible states is denoted S . 
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2. Decision 


A decisionx € Ais the next AC to be visited by the Recognizer. The decision 


in B is defined in the same way as in the Original Problem B . 


3. Information 


_ = ’ : . ; 
Let the random vector w= (Az, Ediss Z,,) denote the information obtained when 


a decision is fathomed in the Upper Bound Problem B. The definition of each of these 
four vector components is exactly the same as in the Original Problem B , but because 
each of these random variables has different probability mass function than in the 
Original Problem B, we use different names for these random variables. The space of 
possible information values W is the same for Band forB (though the probability of 


obtaining each realization from these possible values is different). 
4. State Transition 


The state transition function in the Upper Bound Problem differs from that in the 
Original Problem by the fact that it does not include the decision x as an argument. Recall 
that the state transition function in the Original Problem requires x as an argument in 
order to calculate the new state’s probability vectors 7“ and 6” , which are not a part of 
the state definition in the Upper Bound Problem, and as such they do not need to be 
calculated. The decision x is of course a key factor in determining the next state, but it 


does so indirectly by influencing the obtained information w(s, x ) : 


o”:Sxwos (3.53) 
t+At, 
5" (5,0(5,x))=| 7, (3.54) 


ons —_ A — —. = eee ts . . . 
where 5 = (7 Tal ) is the state and w= (Az, Tr Z,) is the obtained information. 


w? w? Ly» 
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5. Reward 


The reward c is a function of the information w and the state 5, and is defined as 


follow: 








c:WxS>R (3.55) 
=a -(7+A1,,) = rae 
kK (1+7) ; ihe (3.56) 








ot t 


t ick Z,) = (t, Fi)! and T is the time horizon used in the scenario. 


W 


where w= (A 


This definition is practically the same as in the Original ProblemB, just using the 


appropriate Upper Bound Problem’s variables. 
6. Bellman Equation 


We define the value of being in state 5s as the maximum expected cumulative 
reward that can be obtained from the time of being in state s to the time horizon 7’. This 


value V (5) is given by Bellman equation as presented earlier (2.9): 





7(5) =| mR LEC) +7 (E" (5) 1<? @sn 
; r>T 


where f is the time of state 5 = (t, mel : 


hi Bellman Equation Calculation 


The formulas used for the calculation of a state’s value in the Upper Bound 
Problem B are similar to those in the Original Problem B. We use the same formulas for 
the two expected value terms in the Original Problem Bellman equation, appearing in 
(3.51) and in (3.52), when we replace the current state’s 9 and z by the steady-state 
O° and z°, with the probability components for both a target and a neutral at the current 


Recognizer’s location 7 set to 0. Let 9° and 7° denote these updated probability vectors: 


0 = 
@° |e as (3.58) 
a 


a 
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0 = 
7 = ‘ PE (3.59) 
a=Pr 


Substituting 9 and z with (3.58) and (3.59) in (3.51) and (3.52), while explicitly 
computing the next state using the state transition function in (3.54), we get the following 


formulas for computing the Bellman equation in the Upper Bound Problem: 


(7+A7,.7,.4))]=()(a)+D((anav)) +O ((v) v0) + (va) var) 


jeA jeA 


R a 


v(5" =(F+75 +147/,,3,/)] 














A (rT ker! ) y ce kd (3.61) 
. (ri EO = Na O(x AD [sIA-w) (u)° | 





As in the Original Problem, computing the value of the initial state requires going 
over all combinations of state, decision and information realization. The main difference 


from the corresponding calculation in the Original Problem is the much smaller size of 


the state space Is | =T-|A * due to the fact that the probability vectors z and @ are nota 





part of the Upper Bound Problem state s =(trr)" This means that the run time of 


calculating the value of the initial state in the Upper Bound Problem is 
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o(r Al -(3-|A]+ 2)) , which is much faster than the Original Problem run time, which 


is o(r |Ay” -(3-|A]+ 2)) (as discussed in Chapter III.A.4). 

When there is a small number of possible transitions from each AC in the AOI 
(i.e., once in a specific AC, an object can only move to a small number of neighboring 
ACs), there is a way to improve the run time of this algorithm. This is accomplished by 
calculating the possible realizations of the information w only for those ACs which have 
a non-zero probability the object has moved to. Let the set of ACs reachable from AC a 


in a single time step transition be denoted as the forward star of AC a (for all ACs 


aeéA). Let w denote the maximal of all ACs’ forward star sizes. The results run time of 


o(r : Al (3 “+ 2)) is indeed an improvement when p< A] ; 


C; REWARD CALCULATION USING HEURISTIC’S POLICY 


Theoretically, calculating the reward collected following the heuristic decision 
policy could be done in a similar way to the backward solution algorithm for solving the 
Bellman equation in the Original Problem discussed in Chapter HI.A.4, using the 
formulas for the expected values terms in (3.51) and (3.52). This calculation is simpler 
than calculating the collected reward following the optimal decision policy, because we 
do not need to compare the possible future rewards given every possible decision at each 
state, but only the future state resulting from making a decision based on our heuristic. 
Nevertheless, this simpler calculation is still intractable as the state space is still very 


large (we still need to solve for each state in the Original Problem state space S ). 


The method used in this thesis is to estimate the value of each state following the 
heuristic decision policy using a Monte-Carlo simulation instead of an exact calculation 
as discussed in the above paragraph. Instead of directly calculating the expected value in 
the Bellman equation, we simulate the scenario while randomizing the required 


realizations of the obtained information w using its known probability mass function. We 
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continue to generate these single-runs while keeping track of the mean and variance of 
the collected rewards until the confidence interval for the expected reward is sufficiently 


small. 


Figures 5 through 7 demonstrate three states taken from a heuristic calculation 
run. Each figure presents the probability map for neutrals at each AC in the AOI (z) and 
the probability map for targets at each AC in the AOI (@), together with the decision x , 


the Recognizer locationr at the time of the decision, and the Recognizer location r,, at 


the time the decision has been fathomed. Figure 5 shows the state at time t=0, which 
represents the steady-state. The Recognizer initial location is AC #18, it decides to visit 
AC #13 which turns out to be empty, and so the Recognizer eventually ends up in AC 
#13. Figure 6 shows the state at time t =1, at which the Recognizer location is AC #13 
(where it ended up in the previous state transition). Note that both the probability of a 
target and the probability of a neutral in AC #13 were set to 0 (as discussed in Chapter 
III.A.2). This time the Recognizer decides to visit AC #17, which also turns out to be 
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Figure 5. A state example (t =0) 
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Figure 6. A state example (t =1) 
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Figure 7. A state example (t = 18) 


Figure 7 shows the state at time t=18, at which the Recognizer location is AC 
#9. Note that the overall probabilities of targets and neutrals in each AC in the AOI seem 
to be much lower than in the beginning of the run (i.e., in steady-state). This is due to the 
fact that many ACs has been visited and the corresponding probability components of 7 


and @ were set to 0 after each visit, and as time advanced low probabilities “propagated” 
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to the rest of the AOI according to the neutrals and targets Markov chains. The 
Recognizer has decided to visit AC #7, in which an object was detected and so it was 
tracked while moving to AC #6 (from this figure we cannot tell whether the object was 


flagged as a likely target or not, nor if it was eventually intercepted). 


Figure 8 demonstrates the result of a heuristic calculation run with time horizon 
t = 48. In the bottom part of the figure we can identify seven events of object detection, 
out of which three have not resulted in flagging the tracked object as a likely target, and 
so the Interceptor has not been called for interception (t = 0,13,17 ), while the remaining 
four detection events resulted in the tracked object being flagged as a likely target and so 
the Interceptor has been called in for interception (t =5,24,31,36). Out of these four 
interception attempts, we can identify two successful targets interception which resulted 
in collecting rewards at times t=9,33(the delay is due to the time it takes the 
Interception to reach the flagged object and intercept it). The two remaining intercepted 


objects were in fact neutrals. 





# Intercepted Targets 











1 1 
20 25 
Time periods 


Detection, Interceptor called [+--+ --:- rece tees \ | 


Detection, Interceptor not called \ Panesesiiy \ Se ae Neate i dieses tase 
No Detection} . a4 suing esol ee ss it |. 














20 25 30 36 40 45 
Time periods 





Figure 8. A heuristic run summary example with two intercepted targets 
(T =48) 
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IV. NUMERICAL CASE STUDY 


A. OVERVIEW 


In this chapter, we present a numerical case study that addresses the 
implementation of the different models, the specific scenarios, the data used, the results 
of the different runs, and the insights resulting from this analysis. The main scenario 
chosen to be analyzed is of a 25 ACs strait-like AOI, with slightly different movement 
patterns for neutrals and targets, and with a time horizon of 48 time steps (representing 12 
real-life hours). All scenarios were implemented and analyzed using MATLAB. All 
together, we have run seven different scenarios, including 47 Upper Bound Problem 
expected value calculations, 43 heuristic’s expected value estimations, and over 50 


MATLAB run hours. 
B. SPECIFIC SCENARIOS AND DATA 


The numerical case study presented in this chapter includes a baseline scenario, 
and several variations of this baseline scenario for sensitivity analysis and evaluation of 


the robustness of the baseline scenario’s results and insights. 
1. Baseline Scenario 


The baseline scenario represents a strait-like AOI, with land on the North and 
South edges of the AOI (i.e., no arrivals from or departures to the North and South of the 
AOI). The AOI Ais a 5-by-5 square grid of 25 total ACs (see Figure 9). Each AC 
represents a 5-by-5 nautical-miles area in real-life, with a total are of 625nm’. The 
boundary of the AOI, F, is the five ACs on the West edge of the AOI (ACs #1-5) and the 
five ACs on the East edge of the AOI (ACs #6-10) . AC #26 represents the area A) 


outside of the AOI. The single time-step probability of a neutral and the single time-step 
probability of a target arriving to each of the ACs in the boundary is 0.05 and 0.01, 
respectively. The Markov chains representing neutrals movement and targets movements 


are slightly different to represent an operational situation in which one has some 
4] 


intelligent or other prior knowledge about the differences in the expected movement of 
the two objects’ types. In a single time step, both types of objects can only move to the 
four immediate neighboring ACs, with no diagonal movement. This results in having a 
maximal forward star size 44=5. Generally speaking, we assume neutrals tend to move 
across the strait (West-East traffic), while targets tend to move perpendicular to the 
shipping lanes (North-South traffic). This distinction is not absolute, meaning that both a 
neutral object and a target can move to the exact same neighboring ACs, just with 
different probabilities (1.e., there is no feasible movement which is unique to either 


targets or neutrals). 
+M—— 45nm ———> 


Land 


+M®_— 25nm ——> 





Figure 9. The baseline scenario AOI 


For neutrals, the probability of moving on the West-East axis is always double 
than the probability of moving on the North-South axis. For targets, the situation is 
flipped, with double the probability of moving on the North-South axis than on the West- 
East axis. For each of the ACs in A, there is a 0.1 single time step probability of staying 


in the same AC. The probability of moving back from A, to any AC in A is 0, and an 
object currently in A, will remain there with probability 1 (A, is an absorbing state in 


the individual Markov movement process of each of the objects). Figure 10 shows 


examples of single time step transition probabilities for targets and for neutrals, in a 
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geographical manner, while Table 1 and Table 2 present the complete Markov transition 


probability matrices for both neutrals and targets. 


0.3 


All other ACs: 0.1 


0.15 0.3 





Figure 10. Partial Markov transition prob. of neutrals (right) and targets (left) 


Both the Recognizer and the Interceptor in this baseline scenario start in the 
middle of the AOI in AC #18. A single time step represents 15 minute in real-life. The 
operational time horizon used is 12 hours, and so we have a time horizon of 48 time 
steps. We assume the Interceptor has roughly the same velocity as the both neutrals and 
targets, which is one AC per time step (approximately 20 knots in real-life). The 
Recognizer velocity is assumed to be four times the velocity of the Interceptor and of the 
objects (approximately 80 knots in real-life). The Recognizer’s and Interceptor’s 
traveling times between each pair of ACs is calculated as following: 


TR e el) (4.1) 


velocity 


Tle a (4.2) 


velocity 


where R and J are the Recognizer’s and Interceptor’s velocities, respectively, 


velocity velocity 


and ||a—a'|| is the Euclidean metric representing the geographical distance between AC 


aand ACa’'. Traveling times between all pairs of ACs are presented in Table 3 and 
Table 4. Recognizer traveling times include detection time and Interceptor traveling times 


include boarding time. 
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Table 1. | Neutrals Markov transition probabilities matrix P 








Table 2. Targets Markov transition probabilities matrix Q 
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Table 3. Recognizer’s all AC pairs transition times 





Table 4. _ Interceptor’s all AC pairs transition times 
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The Recognizer’s sensor is assumed to take three glimpses at the tracked object 


during the single time step tracking phase ( g =3 ). The false positive and false negative 


detection probabilities of a target are both 0.2 (u=v=0.8). 


The discount factor used for discounting rewards is vy = 0.05, which means that a 
target intercepted at the end of the 12 hours time horizon has approximately \) of the 


operational value of a target intercepted at f=0. 


The value of the probability threshold M , which the Recognizer uses to flag a 
tracked object as either a likely target or a likely neutral, is systematically varied to 


examine its effects on the results. 


Using a MacBook Pro with Dual-Core 2.53GHz CPU and 4GB of RAM, 
computing the expected reward of the Upper Bound Problem with this scenario has a run 
time of approximately 30 minutes, while computing the expected reward following the 


heuristic decision policy with this scenario has a run time of approximately 6 minutes. 
2. Additional Scenarios 


To better evaluate our suggested heuristic’s performance and to gain a deeper 
insight into the nature of this problem, we perform a brief parametrical study and 
sensitivity analysis on several key parameters in the baseline scenario. This is 
accomplished by running and analyzing several other scenarios based on the baseline 
scenario. These scenarios include a zero-discounting scenario, a scenario with poor 
sensor capabilities, a scenario with extended boarding time for the Interceptor, a 48 hours 


time horizon scenario and an 1600nm? 8-by-8 AOI scenario. 
C. RESULTS AND INSIGHTS 


The following sections present the results of the numerical case studies with 
respect to the different scenarios examined, and discuss some insights derived from these 


results. The first section discusses the main question the numerical case study is intended 
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to answer: How well does the suggested heuristic perform? The subsequent sections 
discuss additional insights and interesting results regarding the MIM operational 


scenario. 
iF Performance of the Heuristic Decision Policy 


The key motivation for this numerical case study is to evaluate the performance of 
our suggested heuristic in the context of some representative scenarios. The main result is 
the gap between the expected reward (1.e., discounted expected number of intercepted 
targets) achieved by the heuristic decision policy and that achieved by the Upper Bound 
Problem optimal decision policy. Note that this gap is an upper bound on the true gap 
between the performance of the heuristic decision policy and the Original Problem 
optimal decision policy, and so the gap presented here is a “worst case scenario”. Table 5 
and Figure 11 present the results of both the heuristic and the Upper Bound Problem 
decision policies in the baseline scenario, using several different values as the probability 
threshold for deciding whether or not to call the Interceptor at the end of the tracking 
phase of a detected object. The error bars in Figure 11 (and later in Figures 12-14) 
represent the confidence intervals of the estimated expected reward following the 


heuristic policy. 


In this baseline scenario, the gap is about 30% on average for different values of 
the probability threshold M , with relatively little sensitivity to the choice of M . This 
means that using the heuristic decision policy results with at least ~70% of the Original 


Problem expected reward. 


Additional scenarios, presented in the subsequent sections, support the statement 
that the heuristic is useful in obtaining a simple and effective decision policy for the MIM 
operational scenario. By choosing the appropriate value for the probability threshold M , 
we get that all examined scenarios resulted in a gap of less than 40% between the 


heuristic expected reward and the optimal expected reward. 
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Table 5. Baseline scenario results 


Baseline scenario 
(gamma=0.05, V_R=80kts, V_l=20kts, T=12hrs, u=v=0.8, g=3) 
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Figure 11. Baseline scenario results 
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Ze No Discounting of Rewards 


This scenario is identical to the baseline scenario except we did not use any 


discounting (vy =0). 






































Table 6. | No-discounting scenario results 


No-discounting scenario 
(gamma=0, V_R=80kts, V_I=20kts, T=12hrs, u=v=0.8,g=3) 
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Figure 12. No-discounting scenario results 
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In this no-discounting scenario, the gap is slightly larger than in the baseline 
scenario, with about 40% gap on average for different values of the probability 
threshold M . This means that using the heuristic decision policy (in this no-discounting 
scenario) results in expected reward that is at least ~60% of that in the Original Problem. 
The shape of the graph in Figure 12 is very similar to the baseline scenario result, with 
relatively little sensitivity to the choice of M.A possible explanation to the slightly 
better results when using discounting over no-discounting is the “greedy” nature of the 
heuristic. A myopic approach, as used in this heuristic, makes more operational sense 
when there is higher operational value for intercepting targets in the near future than for 
targets intercepted later in the future. Nevertheless, even without any discounting at all, 
the myopic heuristic is useful, with only ~10% worse performance than with a 


discounting factor of vy =0.05 as used in the baseline scenario. 


3. Low Quality Signature Recognition 


In this scenario, we assumed the Recognizer sensor has poor signature 
recognition, meaning they can hardly tell between a neutral and a target based on the 
signature of the tracked object. The false positive and false negative detection 
probabilities of a target were assumed to both be 0.4 with u=v=0.6 (Note that 
u=v=0.5 is a useless sensor with no ability to identify the type of a tracked object). 
This scenario was run without discounting the intercepted targets (vy =0). As expected, 
the overall results (Table 7 and Figure 13) are worse than the corresponding scenario with 


better sensor quality. 


We can easily notice the effect of the poor sensor quality on the results of this run: 
Except when choosing to intercept every detected target (M =0), as we choose higher 
values for the probability threshold M we get worse and worse results, meaning less and 
less targets intercepted. This can be explained by the fact that with such poor quality 
signature recognition sensor, situations in which a tracked object has a probability of 
being a target of 0.2 and above is rare (remember that the probability of an object being a 


neutral prior to any sensing is higher than the probability of that object being a target, that 


50 


is because of the arrival probabilities of neutrals and targets to the AOI). This results in 


the fact that any threshold of 0.2 and above is rarely met. 


This situation of using poor sensors is obviously unwanted, but if there is no way 
to avoid it, a very low probability threshold is best. Using a high probability threshold in 
this scenario does not make any sense, and so we ignore the poor results of the heuristic 


in these cases (as bad as ~94% gap) when evaluating the overall heuristic performance. 






































Table 7. Poor signature recognition scenario results 


Low quality signature recognition scenario 
(gamma=0, V_R=80kts, V_I=20kts, T=12hrs, u=v=0.6,g=3) 
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Figure 13. Poor signature recognition scenario results 
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4. How Often Should We Call the Interceptor? 


Inspecting the results of the baseline scenario (Table 5 and Figure 11), we notice 
that the “best” probability threshold in the heuristic context is M =0.05. This threshold 
value results in an expected number of 0.76 intercepted targets in the Upper Bound 
Problem, which is practically the same as the optimal 0.77 targets. A threshold of 
M =0.05 also results in the lowest observed gap of 29.7%. All of the above can point to 
the fact that it is good to choose a value of M =0.05 when creating the CONOPS of an 
interdiction force under the assumptions of the baseline scenario. This result appears to 
be counter intuitive, as it appears intuitive to choose a higher threshold. It seems to make 
sense to choose a threshold of at least 0.5, meaning that we should call the Interceptor 
(and “waste” the time associated with the interception) only when it is at least more likely 
that a tracked object is a target than it is a neutral. The first guess of an effective 
threshold value, when this scenario was first implanted and tested, was in fact 
M =0.8(the motivation was to set the threshold high enough as to only call the 
Interceptor when it is most likely that a target will be eventually intercepted). The fact 
that such high probability threshold values are worse than much lower values as 
M =0.05 was initially surprising. To better understand these counterintuitive results, we 
evaluated several scenarios with longer interception times. In the baseline scenario, the 
time the interdiction force “pays” for calling in the interceptor over letting the tracked 
object go without intercepting it, is only the travel time of the Interceptor from its current 
location to the interception location. While speculating that this “time-penalty” for 
risking an interception is too low, we constructed and analyzed two new scenarios in 
which we artificially extended the time of interception by introducing a boarding time for 
the interception of objects. This means that every time the Interceptor is called in to 
intercept a likely target, the time it takes until the interception is complete is the sum of 
the travel time of the Interceptor to the interception location and the boarding time. A 
longer boarding time discourages calling the Interceptor “too often.” Table 8 and Figure 
14 compare the results of these three scenarios: the baseline scenario with 0 boarding 
time, a scenario with 5 time steps boarding time, and a scenario with 20 time steps 
boarding time. 
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Table 8. Sensitivity to boarding time (x marks scenarios which have not been 
calculated) 





The main motivation for the analysis of these scenarios is to examine the shape of 


the graphs, and so not all data points are computed for both scenarios. 
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Figure 14. Sensitivity to boarding time 
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The results of the scenario with five time steps boarding time are indeed more 
intuitive than the baseline scenario with 0 boarding time. Note the shape of the graphs in 
the two new scenarios, for both the Upper Bound expected reward and the heuristic 
expected reward, which suggests it is best to choose higher values for the probability 
threshold parameter M than in the baseline scenario. A threshold value of approximately 
0.2 seems to have the best results in the scenario with boarding times of five time steps, 
while a value of approximately 0.4 seems to be the best in the scenario with boarding 
times of 20 time steps. These results make sense since with longer overall interception 


time it is not that efficient to call the Interceptor too often. 
3s Extended Time Horizon 


All the scenarios presented so far had a time horizon of 12 hours (JT =48 time 
steps). To confirm the heuristic’s performance and insights presented earlier in this 
chapter, we examine a scenario with 24 hours time horizon (7 =96 time steps, with all 
other parameters as in the baseline scenario). The heuristic’s expected reward in this 
scenario is 0.57 and the Upper Bound expected reward is 0.85, with a gap of 33%. The 
heuristic performance in this scenario is very close to the observed performance in the 
baseline scenario with half the length of the time horizon. This encouraging result 
supports our statement that difference between the heuristic and optimal expected 


rewards is less than ~30%. 
6. 8-by-8 AOI 


Another scenario examined for confirming our results and insights is a scenario 
with a larger AOI: a 1600nm* AOI (8-by-8 ACs) instead of the 625nm? AOIs (5-by-5 
ACs) used in all previous scenarios. The number of ACs in this enlarged AOI is 


& =~ 2.5 times the number of ACs in the baseline scenario. Recall that the run time of the 
Upper Bound Problem backward calculation is o(r : JAP (3. M+ 2)) , which means that 


the running time for the Upper Bound calculation of this 8-by-8 scenario is 


approximately 16 times longer than the 5-by-5 scenario run time. As the baseline scenario 
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Upper Bound calculation takes approximately 30 minutes, this 8-by-8 scenario indeed 
takes approximately 8 hours. For this reason, we only examined a single run of this 8-by- 


8 scenario. The rest of the scenario parameters are as in the baseline scenario. 


The heuristic expected reward in this scenario is found to be 0.45, while the 
Upper Bound expected reward is 0.62, with a gap of 28%. The result of this scenario is in 
agreement with previously presented results and, therefore, supports our statements 


regarding the heuristic performance. 


a, 
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V. CONCLUSIONS AND RECOMMENDATIONS 


A. CONCLUSIONS 


The goal of this thesis is to develop a useful tactical decision aid for an 
interdiction force in a MIM scenario. We discuss the reasons why finding the exact 
optimal decision policy in this scenario is an intractable problem, and so we suggest a 


heuristic sub-optimal decision policy instead of the optimal one. 


Based on the analysis of several numerical case studies, we conclude that the 
number of intercepted targets following the heuristic decision policy is at least 60% of the 
number of targets intercepted following the optimal decision policy. This percentage 
improves to approximately 70% when discounting intercepted targets with a discount 


factor of 0.05 with respect to time steps of 15 minutes. 


Based on the observed heuristic performance in the numerical case study, we 
recommend the use of this heuristic in any MIM scenario which closely resembles the 
MIM scenario discussed in this thesis. While 40% is indisputably a significant gap, the 
heuristic decision policy calculation can be completed almost instantaneously while the 
true optimal decision policy is intractable to compute in any plausible operational 


situation. 


Furthermore, we gained additional insights regarding the effect that several 
operational and technical parameters of the interdiction force have on the expected 
outcome of the MIM. Such insights include how to better choose the probability 
threshold for intercepting a tracked object, and what performance should we expect from 
our heuristic and from any other feasible decision policy (including the optimal one) 


under different scenarios. 


This thesis is a first attempt to obtain an optimal operational policy for a 
synchronized sensor-interceptor maritime interdiction force or to quantitatively analyze a 


heuristic approach to this problem. The models, methodology and even the 


57 


implementation code developed in this thesis, can be easily applied to future research of 


this problem or similar ones, as briefly discussed in the next section. 
B. SUGGESTED WORK AHEAD 


There are several interesting research avenues to be extended from this thesis. 
New heuristic decision policies can be developed and evaluated with minimal 


modifications to the heuristic decision policy suggested in this thesis. 


Improved implementation of the models presented in this thesis with faster run 
time and more efficient memory use, can enable the analysis of bigger and more realistic 
operational scenarios that will reinforce our confidence in the heuristic performance 


presented in this research. 


Extensions of the scenarios and models in this thesis can include the introduction 
of multiple Recognizers and/or Interceptors, the optimization of the Interceptor location 
throughout the scenario (instead of waiting at its current location until called for by the 
Recognizer), or a more realistic interception model for the phase between the flagging of 


a tracked object as a likely target and the actual interception. 


Another extension of this thesis can be to introduce “Red-team intelligence”, 
meaning that we allow the targets to act strategically react to the interdiction force’s 
actions. Such possible reactions must be taken into account when optimizing the 


operational policy of the interdiction force, using game-theoretic approaches. 


An alternative approach to the numerical analysis of any suggested heuristic is to 


attempt to analytically prove an approximated sub-optimal decision policy. 
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APPENDIX. MATLAB IMPLEMENTATION OVERVIEW 


The implementation of all models and algorithms in this thesis was coded in 
MATAB 7 (R14). All runs were done on a MacBook Pro with an Intel Dual-Core 
2.53GHz CPU with 4GB RAM, on Windows XP Professional. 


The MATLAB code (Table 9) implements the heuristic expected value 
calculation (H) and the Upper Bound Problem expected value calculation (U). The code 


consists of two main run files and 13 supporting functions (some of which are used in 


both main run files). All together, there are approximately 2000 lines of code. 















































Heuristic_calc.m Heuristic expected value calculation (main run file) xX 
Upper_Bound_calc.m | Upper Bound Problem expected value calculation (main run file) 

cteate_AOLm Creates the AOI: Markov transition matrices, arrival probabilities and xX 

ACs trace (translation between ACs index and coordinates) 

travel_times.m Calculates all ACs pairs travel times for both Recognizer and Interceptor | X 
steady_state.m Calculates the targets and neutrals steady-state probability vectors x 
x_heuristic.m Calculates the current decision following the heuristic policy x 
w_realization.m Get a realization of the information according to the appropriate PMF x 
reward.m Calculates the obtained reward x 
sM.m Calculates the new state after decision is fathomed x 
pi_hat.m Targets probability vector update function Xx 
pi_hat_a.m Single AC target probability update function x 
theta_hat.m Neutrals probability vector update function x 
theta_hat_a.m Single AC neutral probability update function x 
theta_rec.m Calculates the probability for a target after the tracking phase x 
plot_prob_map.m Plots the probability vectors as probability color maps x 











Table 9. 


MATLAB code filenames and descriptions 
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